Patent abstract:
A server (100) comprising a plurality of modules (1-8), each module comprising: a communication element (16, 26); a plurality of CPU processors (10, 11, 20, 21); a system on a SOC chip (12, 22) running firmware; a programmable gate array (FPGA) (13, 23); the modules being interconnected by an interconnection (27) between each communication element and an interconnection (28) between each system on a SOC chip; the executed firmware implementing two software components: a satellite management controller (SMC) component (15, 25) for system management, and a baseboard management controller (BMC) component (14, 24).
Publication number: FR3025333A1
Application number: FR1401900
Filing date: 2014-08-26
Publication date: 2016-03-04
Inventors: Claude Brassac; Georges Lecourtier
Applicant: Bull SA
IPC main class:
Patent description:

[0001] The present invention relates to the field of information processing and/or communication systems. More particularly, the invention proposes the realization of a multi-module server and the implementation of the functions associated with it. Among computer server architectures, multiprocessor network servers known as SMP, acronym for "Symmetric Multi Processor", are typically made of a plurality of integrated circuits, commonly called sockets. For these servers, the sockets are the processors of the system, these integrated circuits being connected by a very high speed and very low latency interconnection bus, allowing the realization of a shared memory. Subsequently, "interconnection" means a physical and logical link connecting two connection interfaces. Generally, for electrical performance reasons, each socket directly supports a portion of the memory components of the SMP server. All of this memory is then made coherent by means of memory controllers, making it possible to automatically guarantee that any data modified by one of the processors is then visible to all the others. For physical reasons, an SMP server with a large number of sockets must then be divided into several subsystems. These subsystems can, for example, be made of daughter boards connected to a motherboard, blades connected to a backplane distributing the power supplies and the bus signals, or self-powered modules supplied with voltage and current by the mains. This is called a modular, or multi-module, server. Regardless of the architecture of an SMP server, it requires hardware administration tools based on norms and/or standards. Thus, the Intelligent Platform Management Interface (IPMI) provides a set of interface specifications to manage, i.e. supervise and control, the physical state of certain components or electronic equipment in computer equipment. The IPMI standard allows, by way of example, monitoring in a server of the temperature regulation, the voltages, the power supply of the microprocessors and their proper power-up, the humidity level of the components, or the speed of the cooling fans. All the functions of IPMI are generally activatable via a basic input/output system called BIOS (acronym for "Basic Input/Output System") or via management software provided by the manufacturer of the equipment. Thus, if necessary, for example when an alert associated with an event (e.g. overheating) of a device is raised, the administrator of a network is capable, via an appropriate graphical interface and from the same place (e.g. on a local or remote machine), of electrically shutting down the equipment concerned, restarting it, resetting its parameters if necessary, or switching on alternative equipment. For hardware equipment, the supervision of the physical state of components or electronic equipment is commonly provided by a management controller of the baseboard called BMC (acronym for "Baseboard Management Controller"), implemented on the motherboard or on the main board of the hardware to be supervised. By way of example, for a server, the functions of the BMC are realized by the integration on the motherboard of the server of a system on a SOC chip (acronym for "System On Chip"), on which is executed a firmware implementing the IPMI standard. However, the various existing implementations only respond to the current problem of a single-motherboard server.
In the case of a server comprising a plurality of modules connected via logistic data buses to an interconnection component, for example connected according to the I2C standard (acronym for "Inter-Integrated Circuit"), the use of a BMC is quickly limited. The number of sensors increases proportionally with the number of interconnected modules, while the IPMI standard is limited to a predefined number of authorized sensors. In addition, the increase in the number of modules requires the extension of the logistic data buses. Such an extension, in the case for example of I2C, is unreliable and lacks performance and flexibility.
[0002] In order to overcome the problems of IPMI support, one solution is to turn to proprietary management standards. Such a solution is nevertheless of little interest if it is desired to continue to guarantee both interoperability and the possibility of integrating the servers with existing data management systems, these systems using standard management interfaces. It is therefore necessary to continue to support the IPMI standard, and more generally standard management interfaces, for multi-module computer equipment, in order to guarantee inter-equipment compatibility. The present invention aims to overcome the aforementioned drawbacks. A first objective is to propose a multi-module server architecture. A second objective is to provide a multi-module server with an expandable number of modules and software compatibility with standard management interfaces. A third objective is to propose a firmware compatible with standard management interfaces, such as the IPMI standard, and allowing the management of any constituent module of a multi-module server. For this purpose, there is provided a server comprising a plurality of modules, each module comprising: a communication element capable of ensuring the coherence of a shared memory between the modules; a plurality of CPU processors connected to each other and connected to the communication element; a system on a SOC chip connected to the plurality of CPU processors and to the communication element, the system on a SOC chip executing a firmware; a programmable gate array (FPGA) connected to the system on a SOC chip, to the communication element and to the plurality of CPU processors; the modules being interconnected by an interconnection between each communication element via an XQPI network, and an interconnection between each system on a SOC chip via a private Ethernet protocol network, encapsulating a communication protocol in accordance with the IPMB standard; the firmware executed on each system on a SOC chip of each module implementing two software components, namely: a satellite system management controller (SMC) component able to measure the physical parameters of its module and manage the local functions of that module; a baseboard management controller (BMC) component capable of supervising all the SMCs of all the modules, managing all the functions of the server centrally and exchanging data with each of the BMC components via the interconnection between each system on a SOC chip. Advantageously, in this server, the interconnection of the communication elements and the interconnection of the SOCs are carried out via an interconnection box, the interconnection box comprising a programmable gate array (FPGA).
[0003] Advantageously, in this server, the interconnection box is electrically powered by the modules, each module comprising at least two PSU power supplies, the PSU power supplies being dimensioned with a 2N redundancy. Advantageously, in this server, one of the functions managed by each BMC component is a function allowing the instantiation of the modules, this function being carried out as follows: the sending of an identifier ID request by the FPGA of each module to the interconnection box via the system on a SOC chip; the sending by the FPGA of the interconnection box of a unique identifier ID to the FPGA in response to the identifier request; the determination by the BMC component of the address of its module and of the addresses of the modules to which it is interconnected, the determination of these addresses being carried out according to the identifier ID received by the FPGA.
[0004] Advantageously, in this server, the identifier ID sent by the FPGA of the interconnection box is determined as a function of each physical connection location of each of the modules to the interconnection module.
[0005] Advantageously, in this server, each of the modules comprises: a clock signal generator capable of synchronizing the CPU processors of its module; its FPGA, programmed to partition and group the modules into subsets according to common characteristics.
[0006] Advantageously, in this server, each BMC component is programmed to: identify, in a set or a subset of modules, whether it belongs to a master or a slave module, according to identification information of each of the modules of this set or subset; if it belongs to a master module, configure its FPGA so that the FPGA distributes the clock signal from the master module to the slave modules of the same set or subset; if it belongs to a slave module, configure its FPGA so that the FPGA disables the clock signal of the slave module. Advantageously, in this server, each CPU processor of each module comprises timestamp counters TSC, able to synchronize tasks comprising a plurality of light processes, the synchronization of all the timestamp counters TSC of these CPU processors in a set or subset of modules being carried out by: the sending by each BMC component of each slave module of this set or subset of a notification to the BMC of the master module, when the slave module exits an initialization or reset phase; a notification to the BMC of the master module, when the master module exits an initialization or reset phase; the sending by the BMC of the master module, when all the master and slave modules have exited an initialization or reset phase, of a synchronization signal able to reset all the TSC timestamp counters of the CPU processors for all the modules of the same set or subset. Advantageously, in this server, each FPGA of each module comprises timestamp counters TSC, able, at any occurrence of an error in its module, to record information relating to the error including at least the time stamp of the error, the synchronization of the set of timestamp counters TSC of each FPGA in a set or a subset of modules being carried out by: the sending by each BMC component of each slave module of this set or subset of a notification to the BMC of the master module, when the slave module exits an initialization or reset phase; a notification to the BMC of the master module, when the master module exits an initialization or reset phase; the sending by the BMC of the master module, when all the master and slave modules have exited an initialization or reset phase, of a synchronization signal able to reset all the TSC timestamp counters of the FPGAs for all the modules of the same set or subset. Advantageously, in this server, the identification of an error in a set or subset of modules is performed by a step of comparing the error information recorded by all the TSC timestamp counters of the FPGAs of this set or subset, the BMCs being able to exchange and share the error information in the same set or subset of modules. Other objects and advantages of the invention will become apparent from the description of embodiments, given hereinafter with reference to the accompanying drawings in which: Figure 1 is a representation of a server comprising a plurality of modules according to one embodiment; Figure 2 is a representation of the interconnection of eight modules with an interconnection box in a server according to one embodiment.
[0007] FIG. 1 shows an embodiment of a server 100 comprising a plurality of modules 1, 2, 3, three in this example, such as motherboards or printed circuit boards. Nevertheless, an entirely different number of modules can be integrated in the server 100, for example eight. Advantageously, each of the modules 1, 2, 3 is identical and comprises the following elements: one or more CPU processors 10, 11, 20, 21 (acronym for "Central Processing Unit"). In the illustrated example, each module 1, 2 is bi-socket, that is to say it supports two CPU processors 10, 11 and 20, 21 respectively. Advantageously, the CPU processors 10, 11, 20, 21 of the same module 1, 2, 3 are interconnected via a QPI link 101, 201, acronym for "QuickPath Interconnect"; a system on a SOC chip 12, 22, for example a microcontroller, executing a management firmware. In one embodiment, the firmware comprises two processes executed in parallel, and capable of exchanging information, each respectively logically performing the function of a component: a BMC component 14, 24, and a satellite management controller component, subsequently referred to as SMC component 15, 25, acronym for "Satellite Management Controller". Advantageously, the functions of the BMC components 14, 24 and of the SMC components 15, 25 are defined in the IPMI standard. In one embodiment, the SOC 12, 22 is able to exchange data with the CPU processors 10, 11, 20, 21 via an interface 102, 202 commonly referred to as the PECI interface, acronym for "Platform Environment Control Interface"; a "programmable gate array" component, commonly referred to as FPGA 13, 23 (acronym for "Field-Programmable Gate Array"), allowing the transmission of signals to any component with which it is interfaced, for example clock, error and/or reset signals. Advantageously, the FPGA 13, 23 comprises a link 103, 203 able to exchange data with the CPU processors 10, 11, 20, 21 and a link 104, 204 able to exchange data with the SOC 12, 22; memories such as caches associated with the CPU processors 10, 11, 20, 21; a communication element 16, 26 comprising a link 105, 205 with each of the CPU processors 10, 11, 20, 21, able to ensure the coherence of a "global" memory shared between the CPU processors 10, 11, 20, 21 of each module 1, 2, 3. In one embodiment, the communication element 16, 26 is a component of the BCS type ("Bull Coherent Switch"), a solution proposed by the applicant and commercially available, or any later version thereof such as the BCS2 ("Bull Coherent Switch 2"). The communication element 16, 26 is furthermore able to exchange data with the FPGA 13, 23 via the link 103, and with the SOC 12, 22 via a bus 106, 206 in accordance with the I2C standard. Advantageously, the modules 1, 2, 3 of the server 100 are interconnected by high-speed links, realizing the following interconnections: an interconnection 27 of the communication elements 16, 26 of the modules 1, 2, 3 via an XQPI network, acronym for "eXtended QuickPath Interconnect", which is an interconnect solution proposed by the applicant and commercially available. Advantageously, the interconnection 27 allows the communication elements 16, 26, such as the BCS2 specific to each of the modules 1, 2, 3, to exchange information, thus ensuring the memory coherence of all the modules 1, 2, 3.
Interconnection 27 via the XQPI network further enables the transport between each module 1, 2, 3 of synchronization signals, such as clock signals whose synchronization is managed by the FPGAs 13, 23; an interconnection 28 of the SOCs 12, 22 of the modules 1, 2, 3 via a private network in Ethernet protocol, encapsulating a communication protocol in accordance with the IPMB standard, acronym for "Intelligent Platform Management Bus". In the state of the art, the IPMB standard is used on communication buses in accordance with the I2C standard. The use of the IPMB standard on a local area network (LAN), as proposed here, has the advantage of improving the performance, reliability and flexibility of the transmissions of data between each of the SOCs 12, 22 of the modules 1, 2, 3 of the server 100. It should be noted that each link of the interconnection 28 comprises a bidirectional Ethernet link and a set of sideband signals, including SPI signals (for "Serial Peripheral Interface"). In one embodiment, the interconnection 27 of the communication elements 16, 26 and the interconnection 28 of the SOCs 12, 22 are realized via an interconnection box 30, illustrated in FIG. 2 and described subsequently. For example, the interconnection box 30 is a backplane comprising a network switch, commonly referred to as a "switch". Advantageously, each BMC component 14, 24 implemented on each SOC 12, 22 can be used as a data exchange interface, in particular between each BMC component 14, 24 of each module 1, 2, 3, as well as with external computer equipment management applications, like any standard management interface. By way of example, an external computer equipment management application used for the server 100 is in the form of a graphical interface proposed to an administrator, enabling him to supervise the physical parameters of a module 1, 2, 3 (e.g. measured temperatures, measured voltages, detection of the presence/absence of a specific interconnection cable) and to communicate to the BMC component an appropriate action to be taken in the event of anomaly detection (e.g. switching a module on or off, or restarting a module 1, 2, 3). To do this, the interconnection 28 of each SOC 12, 22 allows, from a logical point of view, the interconnection of each BMC component 14, 24 of each module 1, 2, 3. The BMC components 14, 24 of each of the modules 1, 2, 3 are therefore able to exchange information between them, the pooling of their functions and their information making it possible to present the server 100, from the point of view of an external equipment management application, as a mono-module server 100. Thus, each BMC component 14, 24 is programmed to handle the high-level, i.e. central, functions of the server 100, for example: the management of the power-up of the server 100; the management of one or more SMC components 15, 25, for example the supervision of their states, the reception of measurements or any other data sent by the SMC components 15, 25, or the transmission of control (piloting) messages to the SMC components 15, 25; a partitioning of clock signals according to existing partitions between different modules 1, 2, 3. Subsequently, the term partition designates, in a multi-module system, a subset of modules 1, 2, 3 able to operate independently of the other modules 1, 2, 3 of the same system.
For example, a partition is a subset of modules 1, 2, 3 running the same operating system OS, or running the same set of operating systems OS if the modules include hypervisor layers. Advantageously, any partitioning method able to group one or more sets of modules 1, 2, 3 sharing common characteristics, for example running the same operating system OS, can be used; the synchronization and the broadcasting, that is to say the routing, of synchronization signals via the interconnection 27 through the XQPI network is performed as a function of the existing partitions between different modules 1, 2, 3. Each SMC component 15, 25 handles the low-level, that is to say local, functions of the module 1, 2, 3 on which it is implanted, and communicates to the BMC component 14, 24 of the same SOC 12, 22 the data that it supervises, for example measurements. By way of example, when the server 100 is powered up, each SMC component 15, 25 powers up its own module 1, 2, 3 according to a preconfigured process (e.g. sequencing, programming of its BCS2, control of voltages) and reports to the BMC component 14, 24 of the same SOC 12, 22 the state of the power-up of its module 1, 2, 3. In addition, each SMC component 15, 25 supervises the state of its own equipment, for example measures a given number of parameters of its own module 1, 2, 3 (e.g. temperature, voltages) and transmits this information to the BMC component 14, 24 of the same SOC 12, 22. The BMC component 14, 24 is then able to make a decision based on the parameters that the SMC component 15, 25 communicates to it, for example restarting a module 1, 2, 3, especially in case of detection of an anomaly. Thus, the hardware components of each module 1, 2, 3 are managed by an SMC component 15, 25 local to the module 1, 2, 3, independently of the other modules 1, 2, 3, while each BMC component 14, 24 can handle a variable number of modules 1, 2, 3 via the interconnection 28 through the IPMB over LAN. Advantageously, such an architecture is very flexible because it is independent of the number of modules 1, 2, 3. According to one embodiment, the physical topology of the Ethernet and SPI links is of star type. The sideband signals of clock, CATERR or TSC_SYNC type are in a meshed physical topology (called, in English, "all-to-all"). According to various other embodiments, other topologies are possible, for example Clos network topologies or multidimensional toroids. Advantageously, the interconnection box 30 is made according to the chosen topology. By way of example, in the case of a server 100 with N modules, for an "all-to-all" topology, each interconnection port of a module 1, 2, 3 must support N-1 high-speed links going to each of its neighbours. If, in an eight-module configuration where N = 8, each module 1, 2, 3 comprises seven unidirectional eight-lane links, the interconnection box 30 is then constructed to support 8 * 7 * 8, i.e. 448, unidirectional high-speed lanes. In addition, in order to improve the resilience of the interconnection box 30 in case of failure, it is possible to realize the switch of this box via a plurality of switches. For example, for a server 100 with eight modules, in order to interconnect the SOCs 12, 22 of the modules 1, 2, 3, the switch of the interconnection box 30 is an eight-port Ethernet switch.
Such a switch can then be implemented via two five-port switches as follows: if each of the switches comprises four data downlink ports (so-called "downlink" ports) and one data uplink port (so-called "uplink" port), the "uplink" ports are then connected via a printed circuit trace. Thus, the SOCs 12, 22 of the modules 1, 2, 3 can only lose all of their means of dialogue in the event of failure of all the switches, this risk diminishing proportionally with the number of switches used for the realization of the switch of the interconnection box 30. For the same reasons of resilience, the set of hardware components that provide the physical links between the modules are passive components. Advantageously, these passive components are chosen so as to have a mean time between failures MTBF (acronym for "Mean Time Between Failures") several orders of magnitude higher than the MTBF of the modules 1, 2, 3. Thus, the number of failures created by these components is negligible when one seeks to evaluate the availability of the server 100. Still in order to ensure good resilience to power outages, each module 1, 2, 3 includes at least two PSU power supplies (acronym for "Power Supply Unit"), which may, for example, be in the form of AC/DC converters (acronym for "Alternating Current / Direct Current") delivering the input power of each module 1, 2, 3, for example at 12 V. Each of the PSU power supplies can moreover be dimensioned with a 2N redundancy, that is to say a doubling of their technical components (e.g. electronic components), and can be connected to two independent AC power networks, to ensure the operation of the server 100 if one of the networks fails. The interconnection box 30 is electrically powered by the modules 1, 2, 3, each sending through the Ethernet links of the interconnection 28 a current under a predetermined voltage, for example 12 V. These currents are then summed by an appropriate electronic device constituting the interconnection box 30, for example a diode switch. In the state of the art, the PoE standard, acronym for "Power over Ethernet", describes the power supply of devices interfaced via an Ethernet link with a switch, through the switch power supply. An electrical failure of the switch therefore prevents power to the devices. In contrast to this standard, the embodiment described above makes it possible to reduce the impact of a power failure on the server 100, this impact being then limited to the perimeter of a possible failure of a module 1, 2, 3, such a probability of failure being reduced by the redundancy of the power supplies. Thus, all the Ethernet links made via the interconnection 28 of the modules 1, 2, 3 work together to provide a redundant power supply to the switch of the interconnection box 30, only this switch comprising active components. The switch of the interconnection box 30 is therefore able to operate despite a possible failure of one or more modules 1, 2, 3. Advantageously, the power dissipated by this switch is very low (of the order of 2 to 3 watts) and the number of components present in this switch can provide an MTBF of the order of several million hours. In addition, care is taken to configure all the BMC components 14, 24 of the SOCs 12, 22 to supervise the operation of this switch and handle any malfunction. Advantageously, the embodiments described above allow any module 1, 2, 3 to resist a failure of the switch of the interconnection box 30, and this despite its central position.
If a failure is detected on the switch, the hot swap of this element does not interrupt the operation of the multi-module server 100.
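Before turning to FIG. 2, the division of roles described above between the local SMC components and the central BMC components may be illustrated by the following purely indicative sketch. It is not part of the claimed embodiment: class names, sensor values and the temperature threshold are hypothetical, and only the structure (SMC measures locally and reports; BMC supervises centrally and decides) follows the description.

```python
# Illustrative sketch (hypothetical, not the firmware of the embodiment).
class SMC:
    """Satellite Management Controller: local, low-level functions of one module."""
    def __init__(self, module_id):
        self.module_id = module_id

    def power_up(self):
        # A preconfigured local power-up process (sequencing, BCS2 programming,
        # voltage checks) would run here.
        return {"module": self.module_id, "powered": True}

    def read_sensors(self):
        # Local measurements (temperatures, voltages) of this module only.
        return {"module": self.module_id, "temperature_c": 42.0, "voltage_v": 12.1}


class BMC:
    """Baseboard Management Controller: central, high-level functions of the server."""
    def __init__(self, smcs, temp_limit_c=85.0):      # hypothetical threshold
        self.smcs = smcs
        self.temp_limit_c = temp_limit_c

    def supervise(self):
        actions = []
        for smc in self.smcs:
            report = smc.read_sensors()
            if report["temperature_c"] > self.temp_limit_c:
                # Central decision based on locally measured parameters,
                # e.g. restart the faulty module.
                actions.append(("restart_module", report["module"]))
        return actions


smcs = [SMC(module_id=i) for i in range(1, 9)]   # eight modules
bmc = BMC(smcs)
print(bmc.supervise())                           # [] while all modules are healthy
```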
[0008] FIG. 2 illustrates the interconnection of eight modules 1, 2, 3, 4, 5, 6, 7, 8 with an interconnection box 30, according to one embodiment, and in accordance with the previously described features. In this figure, each module 1, 2, 3, 4, 5, 6, 7, 8 respectively comprises the elements mentioned above, in particular: an FPGA 13, 23, 33, 43, 53, 63, 73, 83 adapted to manage a specific clock signal 130, 230, 330, 430, 530, 630, 730, 830, the management of these signals being detailed later; a BMC component 14, 24, 34, 44, 54, 64, 74, 84 capable of managing high-level functions such as the instantiation, power-up and restart of the modules 1, 2, 3, 4, 5, 6, 7, 8. The communication elements 16, 26, the CPU processors 10, 11, 20, 21, the SOCs 12, 22 and the interconnections 27, 28 are not shown in this figure in order to simplify its reading, but nevertheless remain present. In particular, each module 1, 2, 3, 4, 5, 6, 7, 8 is respectively interconnected by its communication element 16, 26 and its SOC 12, 22 to the other modules 1, 2, 3, 4, 5, 6, 7, 8, respectively via the interconnections 27, 28 and via the interconnection box 30. In one embodiment, the connection of each module 1, 2, 3, 4, 5, 6, 7, 8 to the interconnection box 30 is performed at the latter respectively via a synchronous data bus SPI 131, 231, 331, 431, 531, 631, 731, 831, acronym for "Serial Peripheral Interface". The data exchanges between each module 1, 2, 3, 4, 5, 6, 7, 8 and the interconnection box 30 via the SPI synchronous data buses 131, 231, 331, 431, 531, 631, 731, 831 are shown in this figure by double arrows between these elements. In addition, each of the SPI synchronous data buses 131, 231, 331, 431, 531, 631, 731, 831 is respectively interfaced with a register 132, 232, 332, 432, 532, 632, 732, 832 of an ancillary device of the interconnection box 30, here an FPGA. Advantageously, the FPGA of the interconnection box 30 makes it possible to allocate to each SOC 12, 22 of each module 1, 2, 3, 4, 5, 6, 7, 8 an identifier during an initialization phase. The information exchanges between the synchronous data buses SPI 131, 231, 331, 431, 531, 631, 731, 831 and the registers 132, 232, 332, 432, 532, 632, 732, 832 are here symbolized by double arrows between these elements. An initialization phase of the modules 1, 2, 3, 4, 5, 6, 7, 8 can, by way of example, occur during a power-up or a restart of the server 100, or even during the initialization of a SOC 12, 22 by a BMC component 14, 24, 34, 44, 54, 64, 74, 84. As previously explained, an interconnection 28 of each SOC 12, 22 of each module 1, 2, 3, 4, 5, 6, 7, 8 is performed via the interconnection box 30 through a private network in Ethernet protocol, encapsulating a communication protocol in accordance with the IPMB standard. Each SOC 12, 22 must therefore, during an initialization phase, be able to start its IP connection (acronym for "Internet Protocol") with an address different from that of its neighbours. To do this, according to various embodiments, each FPGA 13, 23, 33, 43, 53, 63, 73, 83 sends, via the SOC 12, 22 with which it is interfaced and via the interconnection 28, an identifier request to the interconnection box 30. Each request is received by the interconnection box 30 at the synchronous data bus SPI 131, 231, 331, 431, 531, 631, 731, 831, the latter transmitting each request to the register 132, 232, 332, 432, 532, 632, 732, 832 of the FPGA with which it is interfaced.
Advantageously, each request makes it possible to obtain a unique identifier ID known only to the register 132, 232, 332, 432, 532, 632, 732, 832 of the FPGA. This identifier ID is delivered by means of the signals of the SPI bus, which are physically independent of the Ethernet signals (the sideband signals not being, of course, an integral part of the XQPI interface). In one embodiment, each identifier ID relates to a port number, coded for example on three bits, this identifier ID being specific to each physical connection location to the interconnection box 30. Advantageously, this identifier ID is unique and therefore makes it possible to subsequently identify a specific module 1, 2, 3, 4, 5, 6, 7, 8. The interconnection box 30 then communicates, through its FPGA, an identifier ID in response to each request received, this response being transmitted via the synchronous data bus SPI 131, 231, 331, 431, 531, 631, 731, 831 and then by the SOC 12, 22 to the FPGA 13, 23, 33, 43, 53, 63, 73, 83 of the module 1, 2, 3, 4, 5, 6, 7, 8 having issued the request. The identifier ID received by the FPGA 13, 23, 33, 43, 53, 63, 73, 83 is then read by the BMC component 14, 24, 34, 44, 54, 64, 74, 84 of the SOC 12, 22 with which it is interfaced, in order to determine an appropriate instantiation, for example the assignment of an IP address to the module 1, 2, 3, 4, 5, 6, 7, 8. The reading of the identifier ID by each BMC component 14, 24, 34, 44, 54, 64, 74, 84 in each FPGA 13, 23, 33, 43, 53, 63, 73, 83 is symbolized in this figure by a unilateral arrow between these elements. In one embodiment, each BMC component 14, 24, 34, 44, 54, 64, 74, 84 applies to this identifier ID an IP address calculation algorithm in order to dynamically determine and instantiate the IP address of its module 1, 2, 3, 4, 5, 6, 7, 8. Furthermore, each FPGA 13, 23, 33, 43, 53, 63, 73, 83 is capable of reading, through the synchronous data bus SPI 131, 231, 331, 431, 531, 631, 731, 831, the number of modules 1, 2, 3, 4, 5, 6, 7, 8 connected to the interconnection box. Thus, the BMC component 14, 24, 34, 44, 54, 64, 74, 84 of each module 1, 2, 3, 4, 5, 6, 7, 8 is also able to deduce the IP addresses of the neighbouring modules 1, 2, 3, 4, 5, 6, 7, 8 via the application of the same IP address calculation algorithm. The set of determined IP addresses can then, by way of example, be stored by the FPGAs 13, 23, 33, 43, 53, 63, 73, 83. In another embodiment, the IP addresses of each module 1, 2, 3, 4, 5, 6, 7, 8 are addresses of IPv4 type, formed of 32 bits. It is further assumed that each IP address has twenty-nine most significant bits in common, such a configuration being applicable in a private Ethernet network such as that of the interconnection 28. After each identifier ID has been recovered by each of the FPGAs 13, 23, 33, 43, 53, 63, 73, 83, each BMC component 14, 24, 34, 44, 54, 64, 74, 84 then completes this IP address by filling the three remaining least significant bits with the three bits of the identifier ID that it read in its FPGA 13, 23, 33, 43, 53, 63, 73, 83. Thus, each BMC component 14, 24, 34, 44, 54, 64, 74, 84 can be seen as a configuration or dynamic reconfiguration (that is to say auto-configuration) interface of the modules 1, 2, 3, 4, 5, 6, 7, 8, including their instantiation, that is to say the assignment of an IP address, or more generally of an identifier, to each of these modules 1, 2, 3, 4, 5, 6, 7, 8.
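The address calculation just described (a common 29-bit prefix completed by the 3-bit port identifier) can be sketched as follows. The sketch is only illustrative: the base prefix 192.168.0.0/29 is a hypothetical example and does not appear in the embodiment; only the principle of filling the three least significant bits with the identifier ID is taken from the description.

```python
# Illustrative sketch of the IP address instantiation by each BMC component.
import ipaddress

BASE_PREFIX = ipaddress.ip_network("192.168.0.0/29")   # 29 common high-order bits (hypothetical value)

def module_ip(port_id: int) -> ipaddress.IPv4Address:
    """Derive a module's IP address from the 3-bit port identifier ID (0..7)."""
    if not 0 <= port_id <= 7:
        raise ValueError("the identifier ID is coded on three bits")
    # The three least significant bits of the address are the identifier ID.
    return BASE_PREFIX.network_address + port_id

# Each BMC can compute its own address and those of its neighbours, since the FPGA
# can also read, through the SPI bus, the number of modules connected to the box.
own_id = 3                                   # e.g. read from the FPGA register
neighbour_count = 8                          # e.g. read through the SPI bus
addresses = {i: module_ip(i) for i in range(neighbour_count)}
print(addresses[own_id])                     # 192.168.0.3
```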
Each module 1, 2, 3, 4, 5, 6, 7, 8 further comprises a clock generator capable of generating a clock signal 130, 230, 330, 430, 530, 630, 730, 830 which is specific to it. Such a clock is, by way of example, in the form of a square signal whose frequency is in the MHz range, for example 14.7 MHz, 25 MHz or 100 MHz. Advantageously, a clock signal 130, 230, 330, 430, 530, 630, 730, 830 in a module 1, 2, 3, 4, 5, 6, 7, 8 enables the synchronization of all the CPU processors 10, 11, 20, 21 of this module 1, 2, 3, 4, 5, 6, 7, 8. The distribution of each clock signal 130, 230, 330, 430, 530, 630, 730, 830 to the CPU processors 10, 11, 20, 21 of each module 1, 2, 3, 4, 5, 6, 7, 8 is here represented by the unidirectional arrows to the left of the elements 130, 230, 330, 430, 530, 630, 730, 830. According to various embodiments, each clock signal 130, 230, 330, 430, 530, 630, 730, 830 can be transmitted through the interconnection 27 via the XQPI network. Advantageously, each clock signal 130, 230, 330, 430, 530, 630, 730, 830 can be filtered at the input of each module 1, 2, 3, 4, 5, 6, 7, 8 by an electronic phase-locked loop circuit PLL (acronym for "Phase-Locked Loop"), capable of eliminating any phase noise introduced by the physical links of the interconnection 27. Since the clock signals 130, 230, 330, 430, 530, 630, 730, 830 can be inter-module signals, that is to say passing between different modules 1, 2, 3, 4, 5, 6, 7, 8, a synchronization of these signals furthermore proves necessary. To do this, according to various embodiments, each FPGA 13, 23, 33, 43, 53, 63, 73, 83 of each module 1, 2, 3, 4, 5, 6, 7, 8 is able, during a partitioning step, to group the modules 1, 2, 3, 4, 5, 6, 7, 8 according to common characteristics, for example according to the same operating system OS supported by the modules 1, 2, 3, 4, 5, 6, 7, 8, thus forming partitions. Each module 1, 2, 3, 4, 5, 6, 7, 8 therefore belongs to a set or a partition (that is to say a subset) of modules 1, 2, 3, 4, 5, 6, 7, 8 constituting the server 100. By way of example, in FIG. 2, two partitions 40, 50 are made: the first partition 40 is formed of the modules 1, 2, 3, 4, 5, 6 and the second partition 50 is formed of the modules 7, 8. Each BMC component 14, 24, 34, 44, 54, 64, 74, 84 then compares, in the set or subset (i.e. the partition) to which it belongs, the number of its own module 1, 2, 3, 4, 5, 6, 7, 8 with the numbers of the other modules 1, 2, 3, 4, 5, 6, 7, 8 of the same set or subset. As explained above, the number of a module 1, 2, 3, 4, 5, 6, 7, 8 and of its neighbouring modules 1, 2, 3, 4, 5, 6, 7, 8 is identified during a dynamic instantiation step carried out by its BMC component 14, 24, 34, 44, 54, 64, 74, 84. Based on the result of these comparisons, each BMC 14, 24, 34, 44, 54, 64, 74, 84 is able to identify whether the module 1, 2, 3, 4, 5, 6, 7, 8 to which it belongs is a master or slave module in the set or subset of modules 1, 2, 3, 4, 5, 6, 7, 8. In one embodiment, the module 1, 2, 3, 4, 5, 6, 7, 8 with the lowest number/identifier ID is identified by its BMC 14, 24, 34, 44, 54, 64, 74, 84 as the master module, while the remaining modules are identified as slave modules. However, any other type of identification can be made; for example, the master module can be identified as the module with the largest number/identifier ID, or identified according to its address.
More generally, for the sake of simplification, the example is now considered in which the module 1, 2, 3, 4, 5, 6, 7, 8 with the lowest number in a set or subset is identified by each BMC component 14, 24, 34, 44, 54, 64, 74, 84 as the master module, while the remaining modules are identified as slave modules. According to various embodiments, if the BMC component 14, 24, 34, 44, 54, 64, 74, 84 identifies its module 1, 2, 3, 4, 5, 6, 7, 8 as a master module, the BMC component 14, 24, 34, 44, 54, 64, 74, 84 configures its FPGA 13, 23, 33, 43, 53, 63, 73, 83 so that it distributes its clock signal 130, 230, 330, 430, 530, 630, 730, 830 to the other modules 1, 2, 3, 4, 5, 6, 7, 8 of the same set or subset; if the BMC component 14, 24, 34, 44, 54, 64, 74, 84 identifies its module as a slave module, it configures its FPGA 13, 23, 33, 43, 53, 63, 73, 83 so that it deactivates the clock signal 130, 230, 330, 430, 530, 630, 730, 830 local to this module 1, 2, 3, 4, 5, 6, 7, 8. For example, in a same set of eight non-partitioned modules 1, 2, 3, 4, 5, 6, 7, 8, the module 1 is identified as the master module and the modules 2, 3, 4, 5, 6, 7, 8 as slave modules. The module 1 then distributes its clock signal 130 to the modules 2, 3, 4, 5, 6, 7, 8, the latter having their clock signals 230, 330, 430, 530, 630, 730, 830 disabled by their FPGAs. The distribution of the clock signal 130 and the local deactivation of the clock signals 230, 330, 430, 530, 630, 730, 830 are here respectively performed by the FPGA 13 and the FPGAs 23, 33, 43, 53, 63, 73, 83, which are respectively configured by the BMC component 14 and the BMC components 24, 34, 44, 54, 64, 74, 84. The clock is transmitted by a sideband signal, and is therefore part of the interconnection 28 (i.e. the inter-FPGA interface). Thus, the master and slave modules are identified in the same set or subset of modules 1, 2, 3, 4, 5, 6, 7, 8 according to identification information of the modules 1, 2, 3, 4, 5, 6, 7, 8 obtained during their dynamic instantiation.
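The master/slave identification and clock configuration just described can be modelled by the following purely illustrative sketch, which assumes, as in the simplified example above, that the lowest identifier designates the master module; the function name and the returned dictionary are hypothetical and serve only to show the decision taken by each BMC for its FPGA.

```python
# Illustrative sketch (hypothetical code, not the firmware of the embodiment).
def configure_clocks(partition_ids):
    """partition_ids: identifiers of the modules belonging to one set or subset."""
    master = min(partition_ids)              # lowest number/identifier = master module
    config = {}
    for module_id in partition_ids:
        if module_id == master:
            # The BMC of the master module configures its FPGA to distribute
            # the master clock signal to the slave modules of the partition.
            config[module_id] = {"role": "master", "local_clock": "distributed"}
        else:
            # The BMC of each slave module configures its FPGA to disable
            # the clock signal local to that module.
            config[module_id] = {"role": "slave", "local_clock": "disabled"}
    return config


# The two partitions of FIG. 2: partition 40 = modules 1..6, partition 50 = modules 7, 8.
print(configure_clocks({1, 2, 3, 4, 5, 6}))   # module 1 is master
print(configure_clocks({7, 8}))               # module 7 is master
```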
[0009] In another example, illustrated in FIG. 2: the module 1 is identified as the master module in the first partition 40, the modules 2, 3, 4, 5, 6 being identified as slave modules in this partition. The module 1 then distributes, via its FPGA 13 (unidirectional arrows from this element), its clock signal 130 to the modules 2, 3, 4, 5, 6, the latter having deactivated locally, via their FPGAs 23, 33, 43, 53, 63, their clock signals 230, 330, 430, 530, 630 (unidirectional arrows in the FPGA-to-clock-signal direction). Advantageously, the configuration of the FPGA 13 and of the FPGAs 23, 33, 43, 53, 63 is carried out respectively by the BMC component 14 and the BMC components 24, 34, 44, 54, 64 (unidirectional arrows in the clock-signal-to-FPGA direction); the module 7 is identified as the master module in the second partition 50, the module 8 being identified as a slave module in this partition. The module 7 then distributes, via its FPGA 73 (unidirectional arrow from this element), its clock signal 730 to the module 8, the latter having locally disabled its clock signal 830 via its FPGA 83 (unidirectional arrow in the FPGA-to-clock-signal direction). Advantageously, the configuration of the FPGA 73 and of the FPGA 83 is performed respectively by the BMC component 74 and the BMC component 84 (unidirectional arrow in the clock-signal-to-FPGA direction). Each CPU processor 10, 11, 20, 21 of each module 1, 2, 3, 4, 5, 6, 7, 8 is therefore driven by a clock signal 130, 230, 330, 430, 530, 630, 730, 830 common to the partition or the set of modules 1, 2, 3, 4, 5, 6, 7, 8 to which it belongs. Advantageously, each CPU processor 10, 11, 20, 21 comprises a plurality of registers, each register relating to a light process ("thread" in English) executed by the processor 10, 11, 20, 21. These registers are commonly referred to as timestamp counters TSC (acronym for "Time Stamp Counter"), and serve to synchronize "multi-threaded" tasks, that is to say tasks comprising a plurality of light processes. The timestamp counters TSC are initialized for each CPU processor 10, 11, 20, 21 after an initialization/reset ("reset" in English) step, which may, for example, occur when powering up or restarting a module 1, 2, 3, 4, 5, 6, 7, 8. For a successful execution of "multi-threaded" tasks, the TSC timestamp counters must be synchronized. The synchronization of the TSC timestamp counters is advantageously effected via the clock signals 130, 230, 330, 430, 530, 630, 730, 830 at the input of the CPU processors 10, 11, 20, 21. Such a synchronization is complex to ensure, particularly for interconnected modules 1, 2, 3, 4, 5, 6, 7, 8 each comprising a plurality of processors 10, 11, 20, 21, since each CPU processor 10, 11, 20, 21 may initially and locally be driven by its own clock signal 130, 230, 330, 430, 530, 630, 730, 830. The timestamp counters TSC between each module 1, 2, 3, 4, 5, 6, 7, 8 are therefore potentially time-inconsistent and can, moreover, drift.
Thus, according to various embodiments, after a possible step of assembling the modules 1, 2, 3, 4, 5, 6, 7, 8, the synchronization of the TSC timestamp counters of the CPU processors 10, 11, 20, 21 is carried out as follows: each BMC component 14, 24, 34, 44, 54, 64, 74, 84 of each master module of a set or subset of modules 1, 2, 3, 4, 5, 6, 7, 8 performs a configuration of the FPGA 13, 23, 33, 43, 53, 63, 73, 83 of the master module, so that the routing of any synchronization signal is limited to the slave modules of the same set or subset; each BMC component 14, 24, 34, 44, 54, 64, 74, 84 of each slave module sends, via the interconnection 28 through the IPMB over LAN network, a notification message to the BMC component 14, 24, 34, 44, 54, 64, 74, 84 of the master module when the slave module exits an initialization or reset phase (corresponding to an initialization of the timestamp counters TSC). The BMC component 14, 24, 34, 44, 54, 64, 74, 84 of the master module can also be informed via a notification of the initialization or reset of its own module 1, 2, 3, 4, 5, 6, 7, 8; the BMC component 14, 24, 34, 44, 54, 64, 74, 84 of the master module waits until all the modules 1, 2, 3, 4, 5, 6, 7, 8, including itself, have left the initialization or reset phase; when the notifications of all the modules 1, 2, 3, 4, 5, 6, 7, 8 have been received, the BMC 14, 24, 34, 44, 54, 64, 74, 84 of the master module triggers a reset of all the TSC timestamp counters of the CPU processors 10, 11, 20, 21 for all the modules 1, 2, 3, 4, 5, 6, 7, 8 of the same set or subset to which it belongs, including for its own module. Advantageously, this reset is performed by sending a synchronization signal on a physical pin connection ("pin" in English) of each CPU processor 10, 11, 20, 21 of each module 1, 2, 3, 4, 5, 6, 7, 8. The sending of such a synchronization signal therefore causes a synchronous restart of all the TSC timestamp counters of the CPU processors 10, 11, 20, 21 and therefore their time coherence, since each CPU processor 10, 11, 20, 21 then starts TSC counters driven by the same clock in the whole set or subset of modules 1, 2, 3, 4, 5, 6, 7, 8. Furthermore, as explained in the description of FIG. 1, each FPGA 13, 23, 33, 43, 53, 63, 73, 83 of each module 1, 2, 3, 4, 5, 6, 7, 8 comprises a data link with each of the CPU processors 10, 11, 20, 21 as well as with a communication element 16, 26 such as a BCS2. When a first error of the uncorrected, fatal or catastrophic type occurs at a CPU processor 10, 11, 20, 21 or a communication element 16, 26 of a module 1, 2, 3, 4, 5, 6, 7, 8, it can propagate at very high speed, usually in a few microseconds, via the interconnection 27 through the XQPI network, to the other modules 1, 2, 3, 4, 5, 6, 7, 8 of the same set or subset, generating errors in their CPU processors 10, 11, 20, 21 and their communication elements 16, 26. It is therefore necessary, in a multi-module context, to then be able to find the first error, in order to diagnose the failure. More generally, one must also be able to identify (e.g. locate, date) precisely any occurrence of an error in a module 1, 2, 3, 4, 5, 6, 7, 8.
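The TSC synchronization sequence described above, which is reused below for the timestamp counters of the FPGAs, can be modelled as a simple barrier: the BMC of the master module waits until every module of the set or subset, itself included, has notified the end of its initialization or reset phase, then broadcasts a single synchronization signal. The following sketch is purely illustrative; the class and method names are hypothetical and the physical pin signal is only simulated.

```python
# Illustrative sketch (hypothetical, not the firmware of the embodiment).
class MasterBMC:
    def __init__(self, partition_ids, master_id):
        self.expected = set(partition_ids)    # all modules of the set or subset
        self.master_id = master_id
        self.ready = set()

    def notify_out_of_reset(self, module_id):
        """Called for each notification received over the IPMB-over-LAN
        interconnection 28 (or locally, for the master module itself)."""
        self.ready.add(module_id)
        if self.ready == self.expected:
            return self.broadcast_tsc_sync()
        return None

    def broadcast_tsc_sync(self):
        # In the embodiment this is a synchronization signal applied to a physical
        # pin of every CPU of every module of the partition; here it is simulated.
        return {"tsc_sync_sent_to": sorted(self.expected)}


bmc_master = MasterBMC(partition_ids={1, 2, 3, 4, 5, 6}, master_id=1)
for module_id in (2, 3, 4, 5, 6, 1):          # slave notifications, then the master's own
    result = bmc_master.notify_out_of_reset(module_id)
print(result)                                  # the sync signal is sent only once all six modules are ready
```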
[0010] Thus, according to various embodiments, for each FPGA 13, 23, 33, 43, 53, 63, 73, 83, a timestamp counter TSC is also used, realized by a register of configurable size, for example forty bits, and synchronized via the clock signal 130, 230, 330, 430, 530, 630, 730, 830 of the master module of the set or subset of modules 1, 2, 3, 4, 5, 6, 7, 8 to which the FPGA 13, 23, 33, 43, 53, 63, 73, 83 belongs, for example at a frequency of 25 MHz. Each of the FPGAs 13, 23, 33, 43, 53, 63, 73, 83 thus has a TSC timestamp counter perfectly synchronized with those of the FPGAs 13, 23, 33, 43, 53, 63, 73, 83 of the same set or subset of modules 1, 2, 3, 4, 5, 6, 7, 8. Advantageously, thanks to this synchronous time reference, each FPGA 13, 23, 33, 43, 53, 63, 73, 83 is able to date ("timestamp") and record a report about any error or event that occurs in its module 1, 2, 3, 4, 5, 6, 7, 8. It is therefore possible at any time to reconstruct the chronology of several errors or events without ambiguity as to their respective order. To do this, following the timestamping by the FPGAs 13, 23, 33, 43, 53, 63, 73, 83 of events, for example errors propagated in each module 1, 2, 3, 4, 5, 6, 7, 8, all the BMC components 14, 24, 34, 44, 54, 64, 74, 84 are able to exchange and share information relating to these events, in particular in order to find the source of these errors, that is to say the first error. The precise location and diagnosis of a faulty module 1, 2, 3, 4, 5, 6, 7, 8 are thus made easier. According to various embodiments, after a possible step of assembling the modules 1, 2, 3, 4, 5, 6, 7, 8, the synchronization of the TSC timestamp counters of the FPGAs 13, 23, 33, 43, 53, 63, 73, 83, as well as the detection of a first error, is carried out as follows: each BMC component 14, 24, 34, 44, 54, 64, 74, 84 of each master module of a set or subset of modules 1, 2, 3, 4, 5, 6, 7, 8 performs a configuration of the FPGA 13, 23, 33, 43, 53, 63, 73, 83 of the master module, so that the routing of any synchronization signal is limited to the slave modules of the same set or subset;
each BMC component 14, 24, 34, 44, 54, 64, 74, 84 of each slave module sends, via the interconnection 28 through the IPMB over LAN network, a notification message to the BMC component 14, 24, 34, 44, 54, 64, 74, 84 of the master module when the slave module exits an initialization or reset phase (corresponding to an initialization of the timestamp counters TSC). The BMC component 14, 24, 34, 44, 54, 64, 74, 84 of the master module can also be informed via a notification of the initialization or reinitialization of its own module 1, 2, 3, 4, 5, 6, 7, 8; the BMC component 14, 24, 34, 44, 54, 64, 74, 84 of the master module waits for all the modules 1, 2, 3, 4, 5, 6, 7, 8, including itself, to have exited the initialization or reset phase; when the notifications of all the modules 1, 2, 3, 4, 5, 6, 7, 8 have been received, the BMC 14, 24, 34, 44, 54, 64, 74, 84 of the master module triggers a reset of all the TSC timestamp counters of the FPGAs 13, 23, 33, 43, 53, 63, 73, 83 of all the modules 1, 2, 3, 4, 5, 6, 7, 8 of the same set or subset to which it belongs, including for itself. The timestamp counters of the various FPGAs 13, 23, 33, 43, 53, 63, 73, 83 are then perfectly synchronized; at each occurrence of an error (or event) occurring in the module 1, 2, 3, 4, 5, 6, 7, 8, for example at a CPU processor 10, 11, 20, 21 or a communication element 16, 26 (e.g. a BCS2), the FPGA 13, 23, 33, 43, 53, 63, 73, 83 of this module 1, 2, 3, 4, 5, 6, 7, 8 stores, via its timestamp counter TSC, information about the error. The information relating to the error includes, as examples, its type (e.g. uncorrected, fatal), its source (e.g. module number, type of component concerned: CPU or BCS), as well as its time stamp; each BMC component 14, 24, 34, 44, 54, 64, 74, 84 of each module 1, 2, 3, 4, 5, 6, 7, 8 then accesses the information stored by the timestamp counter TSC in the FPGA 13, 23, 33, 43, 53, 63, 73, 83 of its module 1, 2, 3, 4, 5, 6, 7, 8 and renders this information suitably accessible to any administrator of the server 100 or to any external tool for managing the server 100. For example, each BMC component 14, 24, 34, 44, 54, 64, 74, 84 communicates the information read in the FPGA 13, 23, 33, 43, 53, 63, 73, 83 of its module 1, 2, 3, 4, 5, 6, 7, 8 via a log file accessible from an external equipment management application; a step of comparison between the information, for example between the time stamp, the source and the type of each error, communicated by each BMC component 14, 24, 34, 44, 54, 64, 74, 84 then makes it possible to identify a first error among a set of errors occurring in different modules 1, 2, 3, 4, 5, 6, 7, 8, as well as the chronology of the following errors. Such a comparison step may, for example, be automated or performed directly by the administrator of the server 100.
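The comparison step just described can be illustrated by the following sketch: each FPGA records, for every error occurring in its module, at least a timestamp (its synchronized TSC value), the source and the type of the error; the BMC components share these records, and the first error is the record with the smallest timestamp. The record fields follow the examples given in the description, but the exact format shown here is hypothetical.

```python
# Illustrative sketch (hypothetical data layout, not the firmware of the embodiment).
from dataclasses import dataclass

@dataclass
class ErrorRecord:
    tsc: int          # synchronized timestamp counter value of the FPGA (e.g. a 40-bit register)
    module: int       # module number (source)
    component: str    # e.g. "CPU" or "BCS"
    kind: str         # e.g. "uncorrected", "fatal"

def first_error(records):
    """Identify the first error among the records shared by all BMC components."""
    return min(records, key=lambda r: r.tsc)

# Error records gathered (e.g. through log files exposed by each BMC component) after an
# error has propagated through the XQPI interconnection to several modules:
shared = [
    ErrorRecord(tsc=10_482, module=5, component="CPU", kind="fatal"),
    ErrorRecord(tsc=10_401, module=2, component="BCS", kind="uncorrected"),
    ErrorRecord(tsc=10_455, module=3, component="CPU", kind="fatal"),
]
print(first_error(shared))   # the record of module 2 is the origin of the failure
```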
[0011] Advantageously, the identification, for example the determination of the chronology and the location, of the first error makes it possible subsequently to decide on one or more actions to be performed on the server 100, for example restarting a specific module 1, 2, 3, 4, 5, 6, 7, 8, or reconfiguring a partition if one of its modules 1, 2, 3, 4, 5, 6, 7, 8 has failed. Advantageously, the previously described embodiments allow software compatibility with standard management interfaces, for example interfaces in accordance with the IPMI standard, whatever the number of modules 1, 2, 3, 4, 5, 6, 7, 8 constituting the server 100. This is made possible thanks in particular to: the BMC components 14, 24, 34, 44, 54, 64, 74, 84, each being usable as a data communication interface, notably allowing the synchronization of the clocks and of the timestamp counters TSC, as well as the management of the high-level functions, i.e. the central functions, of the different modules 1, 2, 3, 4, 5, 6, 7, 8; the SMC components 15, 25, in charge of managing the low-level functions, that is to say the local functions, of their own modules 1, 2, 3, 4, 5, 6, 7, 8, for example the measurement of their physical parameters and their power-up.
[0012] Thus, great flexibility is obtained in terms of hardware and software architecture, thus facilitating any future need for modification, adaptation or evolution of the server 100. Advantageously, the embodiments described above allow the management of a set or a subset of modules 1, 2, 3, 4, 5, 6, 7, 8 by virtue of the partitioning of their arrangement as well as of their respective clock signals 130, 230, 330, 430, 530, 630, 730, 830. The management of a set or a subset of modules 1, 2, 3, 4, 5, 6, 7, 8 is particularly advantageous because it allows the support of different OS depending on the modules 1, 2, 3, 4, 5, 6, 7, 8, the identification, ordering and precise location of any possible errors for diagnostic purposes, as well as a possible dynamic reconfiguration of the partitions, especially in case of failure of a module 1, 2, 3, 4, 5, 6, 7, 8.
[0013] Another advantage of the proposed embodiments lies in the hardware realization of the server 100: in particular, the way in which the interconnection box 30 and the modules 1, 2, 3, 4, 5, 6, 7, 8 are interconnected and electrically powered allows hot maintenance of each element of the server 100 as well as an improvement of its resilience. In addition, the embodiments described above for a server 100 are also transferable to any equipment or IT infrastructure comprising a plurality of modules 1, 2, 3, for example to a supercomputer.
Claims:
Claims (10)
[0001]
1. A server (100) comprising a plurality of modules (1-8), each module (1-8) comprising: a communication element (16, 26) adapted to ensure the coherence of a shared memory between the modules (1-8); a plurality of CPU processors (10, 11, 20, 21) connected to each other and connected to the communication element (16, 26); a system on a SOC chip (12, 22) connected to the plurality of CPU processors (10, 11, 20, 21) and to the communication element (16, 26), the system on a SOC chip (12, 22) running a firmware; a programmable gate array (FPGA) (13, 23, 33, 43, 53, 63, 73, 83) connected to the system on a SOC chip (12, 22), to the communication element (16, 26) and to the plurality of CPU processors (10, 11, 20, 21); the modules (1-8) being interconnected by: an interconnection (27) between each communication element (16, 26) via an XQPI network; an interconnection (28) between each system on a SOC chip (12, 22) via a private Ethernet protocol network, encapsulating a communication protocol in accordance with the IPMB standard; the firmware executed on each system on a SOC chip (12, 22) of each module (1-8) implementing two software components, namely: a system management satellite controller (SMC) component (15, 25) adapted to measure the physical parameters of its module (1-8) and manage the local functions of this module (1-8); a baseboard management controller (BMC) component (14, 24, 34, 44, 54, 64, 74, 84) capable of supervising all the SMCs (15, 25) of all the modules (1-8), centrally managing all the functions of the server (100) and exchanging data with each of the BMC components (14, 24, 34, 44, 54, 64, 74, 84) via the interconnection (28) between each system on a SOC chip (12, 22).
[0002]
The server (100) according to claim 1, wherein the interconnection (27) of the communication elements (16, 26) and the interconnection (28) of the SOCs (12, 22) are realized via an interconnection box (30), the interconnection box (30) comprising a programmable gate array (FPGA).
[0003]
The server (100) according to claim 2, wherein the interconnection box (30) is electrically powered by the modules (1-8), each module (1-8) comprising at least two PSU power supplies, the PSU power supplies being dimensioned with a 2N redundancy.
[0004]
The server (100) according to any one of claims 1 to 3, wherein one of the functions managed by each BMC component (14, 24, 34, 44, 54, 64, 74, 84) is a function allowing the instantiation of the modules (1-8), this function being performed in the following manner: the sending of an identifier ID request by the FPGA (13, 23, 33, 43, 53, 63, 73, 83) of each module (1-8) to the interconnection box (30) via the system on a SOC chip (12, 22); the sending by the FPGA of the interconnection box (30) of a unique identifier ID to the FPGA (13, 23, 33, 43, 53, 63, 73, 83) in response to the identifier request; the determination by the BMC component (14, 24, 34, 44, 54, 64, 74, 84) of the address of its module (1-8) and of the addresses of the modules (1-8) to which it is interconnected, the determination of these addresses being performed according to the identifier ID received by the FPGA (13, 23, 33, 43, 53, 63, 73, 83).
[0005]
The server (100) according to claim 4, wherein the identifier ID sent by the FPGA of the interconnection box (30) is determined according to the physical connection location of each of the modules (1-8) to the interconnection module (30).
[0006]
The server (100) according to any one of claims 1 to 5, wherein each of the modules (1-8) comprises: a clock generator (130, 230, 330, 430, 530, 630, 730, 830) adapted to synchronize the CPU processors (10, 11, 20, 21) of its module (1-8); its FPGA (13, 23, 33, 43, 53, 63, 73, 83), programmed to partition and group the modules (1-8) into subsets according to common characteristics.
[0007]
The server (100) according to claim 6, wherein each BMC component (14, 24, 34, 44, 54, 64, 74, 84) is programmed to: identify, in a set or subset of modules (1-8), its belonging to a master or slave module, as a function of identification information of each of the modules (1-8) of this set or subset; if it belongs to a master module, configure its FPGA (13, 23, 33, 43, 53, 63, 73, 83), so that the FPGA (13, 23, 33, 43, 53, 63, 73, 83) distributes the clock signal (130, 230, 330, 430, 530, 630, 730, 830) from the master module to the slave modules of the same set or subset; if it belongs to a slave module, configure its FPGA (13, 23, 33, 43, 53, 63, 73, 83), so that the FPGA (13, 23, 33, 43, 53, 63, 73, 83) disables the clock signal (130, 230, 330, 430, 530, 630, 730, 830) of the slave module.
[0008]
8. The server (100) according to claim 7, wherein each CPU processor (10, 11, 20, 21) of each module (1-8) comprises timestamp counters TSC, able to synchronize tasks comprising a plurality of light processes, the synchronization of all of the TSC timestamp counters of these CPU processors (10, 11, 20, 21) in a set or subset of modules (1-8) being performed by: the sending, by each BMC component (14, 24, 34, 44, 54, 64, 74, 84) of each slave module of this set or subset, of a notification to the BMC (14, 24, 34, 44, 54, 64, 74, 84) of the master module, when the slave module leaves an initialization or reset phase; a notification to the BMC (14, 24, 34, 44, 54, 64, 74, 84) of the master module, when the master module exits an initialization or reset phase; the sending, by the BMC (14, 24, 34, 44, 54, 64, 74, 84) of the master module, when all the master and slave modules are out of an initialization or reset phase, of a synchronization signal capable of resetting all the TSC timestamp counters of the CPU processors (10, 11, 20, 21) for all the modules (1-8) of the same set or subset.
[0009]
The server (100) according to claim 7 or 8, wherein each FPGA (13, 23, 33, 43, 53, 63, 73, 83) of each module (1-8) comprises TSC timestamp counters, adapted, on any occurrence of an error in its module (1-8), to record information relating to the error including at least the timestamp of the error, the synchronization of the set of TSC timestamp counters of each FPGA (13, 23, 33, 43, 53, 63, 73, 83) in a set or subset of modules (1-8) being provided by: the sending, by each BMC component (14, 24, 34, 44, 54, 64, 74, 84) of each slave module of this set or subset, of a notification to the BMC component (14, 24, 34, 44, 54, 64, 74, 84) of the master module, when the slave module exits an initialization or reset phase; a notification to the BMC component (14, 24, 34, 44, 54, 64, 74, 84) of the master module, when the master module exits an initialization or reset phase; the sending, by the BMC component (14, 24, 34, 44, 54, 64, 74, 84) of the master module, when all the master and slave modules are out of an initialization or reset phase, of a synchronization signal adapted to reset all the TSC timestamp counters of the FPGAs (13, 23, 33, 43, 53, 63, 73, 83) for all the modules (1-8) of the same set or subset.
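To illustrate this error recording, one possible, non-normative C layout of an FPGA error record carrying the synchronized timestamp could be the following; the field names and widths are assumptions.

```c
#include <stdint.h>

/* Hypothetical layout of one error record kept by the FPGA of a module;
 * the timestamp comes from the FPGA TSC counter reset by the synchronization
 * signal of the master BMC, so timestamps are directly comparable between
 * modules of the same set or subset. */
struct fpga_error_record {
    uint64_t tsc_timestamp;   /* synchronized FPGA TSC value at error time     */
    uint8_t  module_id;       /* module (1-8) in which the error occurred      */
    uint16_t error_code;      /* nature of the error (implementation-defined)  */
    uint32_t detail;          /* additional information relating to the error  */
};
```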
[0010]
The server (100) according to claim 9, wherein the identification of an error in a set or subset of modules (1-8) is performed by a step of comparing the error information recorded by all the TSC timestamp counters of the FPGAs (13, 23, 33, 43, 53, 63, 73, 83) of this set or subset, the BMC components (14, 24, 34, 44, 54, 64, 74, 84) being able to exchange and share error information within the same set or subset of modules (1-8).
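A hedged sketch of this comparison step: once the BMC components have shared their FPGA error records (using the illustrative record layout given above), the smallest synchronized timestamp designates the error that occurred first in the set or subset. The function below is an example, not the claimed method itself.

```c
#include <stddef.h>
#include <stdint.h>

struct fpga_error_record {        /* same illustrative layout as above */
    uint64_t tsc_timestamp;
    uint8_t  module_id;
    uint16_t error_code;
    uint32_t detail;
};

/* Return the index of the record with the smallest synchronized timestamp,
 * i.e. the error that occurred first across the modules of the subset. */
static size_t first_error(const struct fpga_error_record *rec, size_t count)
{
    size_t first = 0;
    for (size_t i = 1; i < count; i++) {
        if (rec[i].tsc_timestamp < rec[first].tsc_timestamp)
            first = i;
    }
    return first;
}
```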
Similar technologies:
Publication number | Publication date | Patent title
EP2998877A2|2016-03-23|Server comprising a plurality of modules
US11194635B2|2021-12-07|Orchestration service for a distributed computing system
US9507566B2|2016-11-29|Entropy generation for a distributed computing system
US20210143999A1|2021-05-13|Methods and apparatus to manage credentials in hyper-converged infrastructures
KR100827027B1|2008-05-06|Device diagnostic system
US6973517B1|2005-12-06|Partition formation using microprocessors in a multiprocessor computer system
US6681282B1|2004-01-20|Online control of a multiprocessor computer system
JP6559842B2|2019-08-14|Multi-node system fan control switch
US10516734B2|2019-12-24|Computer servers for datacenter management
EP2729874A1|2014-05-14|Method and computer program for the dynamic management of services in an administration cluster
US10409940B1|2019-09-10|System and method to proxy networking statistics for FPGA cards
US11099827B2|2021-08-24|Networking-device-based hyper-coverged infrastructure edge controller system
US11201785B1|2021-12-14|Cluster deployment and management system
US10783109B2|2020-09-22|Device management messaging protocol proxy
US10713138B2|2020-07-14|Failure detection for central electronics complex group management
US20210157609A1|2021-05-27|Systems and methods for monitoring and validating server configurations
US20220011968A1|2022-01-13|Mapping of raid-cli requests to vsan commands by an out-of-band management platform using nlp
US20160036650A1|2016-02-04|Microcontroller at a cartridge of a chassis
CN111949320A|2020-11-17|Method, system and server for providing system data
FR2984053A1|2013-06-14|METHOD AND COMPUTER PROGRAM FOR MANAGING MULTIPLE FAILURES IN A COMPUTER INFRASTRUCTURE COMPRISING HIGH AVAILABILITY EQUIPMENT
Patent family:
Publication number | Publication date
FR3025333B1|2017-12-08|
JP2016045968A|2016-04-04|
JP6409229B2|2018-10-24|
EP2998877A2|2016-03-23|
US9934183B2|2018-04-03|
US20160062936A1|2016-03-03|
EP2998877A3|2016-08-03|
BR102015020326A2|2017-05-30|
Cited references:
Publication number | Application date | Publication date | Applicant | Patent title
US6891397B1|2003-04-21|2005-05-10|Xilinx, Inc.|Gigabit router on a single programmable logic device|
US20080313312A1|2006-12-06|2008-12-18|David Flynn|Apparatus, system, and method for a reconfigurable baseboard management controller|
US8595550B1|2011-03-30|2013-11-26|Google Inc.|Back-up power for a network switch|
TWI244594B|2004-07-13|2005-12-01|Quanta Comp Inc|Method for automatically assigning the address of communication ports and a blade server system|
WO2006015366A2|2004-07-31|2006-02-09|Server Technology, Inc.|Transfer switch with arc suppression|
CN1863081B|2005-10-14|2010-05-05|华为技术有限公司|Managing system and method based on intelligent platform managing interface|
FR2898753B1|2006-03-16|2008-04-18|Commissariat Energie Atomique|SEMI-DISTRIBUTED CONTROL CHIP SYSTEM|
US8036247B2|2007-01-05|2011-10-11|Frank Paul R|System and method of synchronizing real time clock values in arbitrary distributed systems|
US7840656B2|2008-04-30|2010-11-23|International Business Machines Corporation|Policy control architecture for blade servers upon inserting into server chassis|
US7788363B2|2008-07-15|2010-08-31|Unisys Corporation|Secure communication over virtual IPMB of a mainframe computing system|
US8201009B2|2009-07-14|2012-06-12|T-Win Systems, Inc.|Computer management and power backup system and device|
US20130080754A1|2011-09-22|2013-03-28|Cisco Technology, Inc.|Service Profile Based Peripheral Component Interconnect Device Enumeration And Option ROM Loading|
US8832473B2|2012-05-24|2014-09-09|Mitac International Corp.|System and method for activating at least one of a plurality of fans when connection of a computer module is detected|
US20140115137A1|2012-10-24|2014-04-24|Cisco Technology, Inc.|Enterprise Computing System with Centralized Control/Management Planes Separated from Distributed Data Plane Devices|
US9229497B2|2012-11-08|2016-01-05|Silicon Graphics International Corp.|On-blade cold sink for high-density clustered computer system|
US9367419B2|2013-01-08|2016-06-14|American Megatrends, Inc.|Implementation on baseboard management controller of single out-of-band communication access to multiple managed computer nodes|
US9529583B2|2013-01-15|2016-12-27|Intel Corporation|Single microcontroller based management of multiple compute nodes|
IN2013CH05264A|2013-05-01|2015-05-29|Wyse Technology Llc|
US8924899B2|2013-05-23|2014-12-30|Daniel Jakob Seidner|System and method for universal control of electronic devices|
US11055252B1|2016-02-01|2021-07-06|Amazon Technologies, Inc.|Modular hardware acceleration device|
US11144496B2|2016-07-26|2021-10-12|Samsung Electronics Co., Ltd.|Self-configuring SSD multi-protocol support in host-less environment|
US10372659B2|2016-07-26|2019-08-06|Samsung Electronics Co., Ltd.|Multi-mode NMVE over fabrics devices|
US20180074984A1|2016-09-14|2018-03-15|Samsung Electronics Co., Ltd.|Self-configuring baseboard management controller |
US10346041B2|2016-09-14|2019-07-09|Samsung Electronics Co., Ltd.|Method for using BMC as proxy NVMeoF discovery controller to provide NVM subsystems to host|
US10496566B2|2016-12-20|2019-12-03|Samsung Electronics Co., Ltd.|Method and apparatus for data recovering during a board replacement|
CN108289041B|2018-01-25|2022-02-22|郑州云海信息技术有限公司|Server information processing method and related device|
US10908940B1|2018-02-26|2021-02-02|Amazon Technologies, Inc.|Dynamically managed virtual server system|
FR3078799B1|2018-03-12|2021-06-04|Bull Sas|MANAGEMENT OF CONFIGURATION DATA FOR A MULTIMODULE SERVER|
CN109298660A|2018-08-14|2019-02-01|华东计算技术研究所(中国电子科技集团公司第三十二研究所)|A kind of control system of Satellite Payloads|
TWI675288B|2018-09-21|2019-10-21|神雲科技股份有限公司|Server rack|
CN109709918A|2018-12-25|2019-05-03|山东华宇航天空间技术有限公司|A kind of satellite intelligence production visualization managing and control system|
Legal status:
2015-07-27| PLFP| Fee payment|Year of fee payment: 2 |
2016-03-04| PLSC| Publication of the preliminary search report|Effective date: 20160304 |
2016-07-20| PLFP| Fee payment|Year of fee payment: 3 |
2017-07-20| PLFP| Fee payment|Year of fee payment: 4 |
2018-09-28| PLFP| Fee payment|Year of fee payment: 5 |
2019-08-27| PLFP| Fee payment|Year of fee payment: 6 |
2020-08-24| PLFP| Fee payment|Year of fee payment: 7 |
2021-11-26| PLFP| Fee payment|Year of fee payment: 8 |
Priority:
Application number | Application date | Patent title
FR1401900A|FR3025333B1|2014-08-26|2014-08-26|SERVER COMPRISING A PLURALITY OF MODULES|FR1401900A| FR3025333B1|2014-08-26|2014-08-26|SERVER COMPRISING A PLURALITY OF MODULES|
EP15178193.7A| EP2998877A3|2014-08-26|2015-07-24|Server comprising a plurality of modules|
US14/824,223| US9934183B2|2014-08-26|2015-08-12|Server comprising a plurality of modules|
BR102015020326A| BR102015020326A2|2014-08-26|2015-08-24|server comprising a plurality of modules|
JP2015166407A| JP6409229B2|2014-08-26|2015-08-26|Server with multiple modules|